Current Issue : October-December Volume : 2022 Issue Number : 4 Articles : 5 Articles
In order to improve the music analysis technology, this paper studies the music analysis technology combined with the spectrum analysis technology and builds an intelligent audio analysis model. In this paper, the nonlinear theoretical method is adopted, and the motion equation of the audio frequency is obtained through variable processing, so as to obtain the mean square fluctuation of the two sound wave models and then obtain the entanglement and compression. Moreover, this paper introduces the combined mode method to describe the interaction of the two laser fields and atomic matter and verifies that both the differential modes are decoupled from the interaction and only the sum mode participates in the interaction. The experiment verifies that the music audio analysis system based on spectrum analysis technology proposed in this paper can play an important role in music analysis....
To address the lack of multimodal emotional databases, a study of computer speech recognition technology and graphic form design is proposed. The hidden Markov model (HMM) is used to recognize speech emotion, mainly to provide the emotion type for the subsequent expression fusion. Once the emotion category in the speech is obtained, the facial animation parameter (FAP) corresponding to the speech emotion can be combined with the lip movement FAP, based on the Moving Pictures Experts Group (MPEG-4) face animation standard, to obtain a comprehensive FAP. The results show that the speech emotion recognition method used in this paper performs better than the other methods in the literature: its average recognition rate reaches 70.24%, which is higher than that of the other three methods. This verifies that the method achieves a better recognition effect....
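As an illustration of how an HMM can supply an emotion type, the sketch below scores a sequence of quantized speech features against one HMM per emotion and picks the most likely one. It is a minimal sketch under stated assumptions, not the paper's system: the two emotion names, the three-symbol observation alphabet, and every probability are hypothetical toy values.

```python
# Toy HMM emotion classifier using the forward algorithm.
# All model names and probabilities are hypothetical illustration values.

def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete-observation HMM."""
    n_states = len(start)
    # Initialise with the first observation.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    # Propagate through the remaining observations.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


def classify(obs, models):
    """Pick the emotion whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Each model is (start probabilities, transition matrix, emission matrix).
# Observation symbols 0, 1, 2 stand for quantized acoustic features.
MODELS = {
    "calm": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],
    ),
    "angry": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.3, 0.7]],
        [[0.1, 0.2, 0.7], [0.1, 0.4, 0.5]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 0], MODELS))
    print(classify([2, 2, 1, 2], MODELS))
```

In a real system the emission model would typically be continuous (for example, Gaussian mixtures over acoustic features) and the parameters would be trained rather than written by hand.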
Large-scale automatic speech recognition (ASR) models have achieved impressive performance. However, training an ASR model requires huge computational resources and a massive amount of data. Knowledge distillation is a prevalent model compression method that transfers knowledge from a large model to a small one. To improve the efficiency of knowledge distillation for end-to-end speech recognition, especially in the low-resource setting, a Mixup-based Knowledge Distillation (MKD) method is proposed that combines Mixup, a data-agnostic data augmentation method, with softmax-level knowledge distillation. A loss-level mixture is presented to address the problem caused by the non-linearity of the label in the KL-divergence when applying Mixup to the teacher–student framework. It is mathematically shown that optimizing the mixture of loss functions is equivalent to optimizing an upper bound of the original knowledge distillation loss. The proposed MKD takes advantage of Mixup and brings robustness to the model even with a small amount of training data. The experiments on Aishell-1 show that MKD obtains 15.6% and 3.3% relative improvements on two student models with different parameter scales compared with the existing methods. Experiments on data efficiency demonstrate that MKD achieves similar results with only half of the original dataset....
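The upper-bound claim can be illustrated numerically. The sketch below assumes that the "original" distillation loss is the KL-divergence between a convex mixture of two teacher distributions and the student output, and that the loss-level mixture is the same convex combination applied to the two separate KL terms; the abstract does not spell out either form, so both are assumptions, and the logits and mixing weight are made-up values. Under these assumptions the bound follows from the convexity of the KL-divergence in its first argument.

```python
import math

# Assumed forms of the two losses; the logits below are illustrative only.

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def label_level_loss(teacher_a, teacher_b, student, lam):
    """KL between the mixed teacher distribution and the student."""
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(teacher_a, teacher_b)]
    return kl_divergence(mixed, student)


def loss_level_loss(teacher_a, teacher_b, student, lam):
    """Convex combination of the two per-teacher KL terms."""
    return (lam * kl_divergence(teacher_a, student)
            + (1.0 - lam) * kl_divergence(teacher_b, student))


teacher_a = softmax([2.0, 0.5, -1.0])   # teacher output for utterance A
teacher_b = softmax([-0.5, 1.5, 0.3])   # teacher output for utterance B
student = softmax([0.4, 0.9, -0.2])     # student output for the mixed input
lam = 0.3                               # Mixup weight

label_level = label_level_loss(teacher_a, teacher_b, student, lam)
loss_level = loss_level_loss(teacher_a, teacher_b, student, lam)

if __name__ == "__main__":
    print(f"label-level loss: {label_level:.6f}")
    print(f"loss-level loss:  {loss_level:.6f}")
```

Because the loss-level value is never smaller than the label-level value, driving it down also drives down the quantity it bounds.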
The speech enhancement effect of traditional deep learning algorithms is not ideal under low signal-to-noise ratios (SNR). The skip connections-deep neural network (Skip-DNN) improves the traditional deep neural network (DNN) by adding skip connections between the layers of the neural network to solve the degradation problem of DNN. In this paper, the Multiresolution Cochleagram (MRCG) features in the gammachirp transform domain are denoised to obtain the improved MRCG (I-MRCG). The noise reduction method adopts the Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator (MMSE-STSA) and takes I-MRCG as the input feature and Skip-DNN as the training network to improve the speech enhancement effect of the model. This paper also proposes an improved source-to-distortion ratio (SDR) loss function. Using the improved SDR as the loss function improves the performance of the Skip-DNN speech enhancement model. The experiments in this paper are performed on the Edinburgh dataset. When using I-MRCG as the input feature of Skip-DNN, the average perceptual evaluation of speech quality (PESQ) is 2.9137, and the average short-time objective intelligibility (STOI) is 0.8515. Compared with MRCG as the Skip-DNN input feature, the improvements are 0.91% and 0.71%, respectively. When the improved SDR is used as the loss function of the speech model, the average PESQ is 2.9699 and the average STOI is 0.8547. Compared with other loss functions, the improved SDR has a better enhancement effect when used as the loss function of the speech enhancement model....
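For orientation, the sketch below computes the standard source-to-distortion ratio, 10·log10(‖s‖² / ‖s − ŝ‖²), and its negative as a loss. It is not the improved SDR proposed in the paper, whose form the abstract does not give; the function names, the stabilising `eps` term, and the toy signals are assumptions.

```python
import math

def sdr_db(reference, estimate, eps=1e-12):
    """Standard SDR in dB between a clean reference and an enhanced estimate."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero for a perfect estimate.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def neg_sdr_loss(reference, estimate):
    """Loss to minimise: a higher SDR gives a lower loss."""
    return -sdr_db(reference, estimate)


# Toy signals: a clean sinusoid and two estimates with different error levels.
clean = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
good_estimate = [c + 0.01 * (-1) ** n for n, c in enumerate(clean)]
poor_estimate = [c + 0.3 * (-1) ** n for n, c in enumerate(clean)]

if __name__ == "__main__":
    print(f"good estimate: {sdr_db(clean, good_estimate):.2f} dB")
    print(f"poor estimate: {sdr_db(clean, poor_estimate):.2f} dB")
```

Minimising the negative SDR directly optimises a signal-level quality measure, rather than a per-bin spectral error.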
The integration and development of music curriculum signals have attracted the attention of researchers in real teaching scenarios. Based on the theory of the multiple intelligences algorithm, this paper constructs a music curriculum integration and development model. This paper establishes a nonnegative matrix decomposition scheme, adds a constraint term that reflects the smoothness of the time-varying gain matrix to the divergence-based objective function, and iteratively solves the problem of curriculum integration and development through the minimization algorithm obtained by constructing an auxiliary function. In the simulation experiment, the optimal solution of each factor matrix was designed, the source music signal was reconstructed, and the sufficiently sparse source music signal was separated. The experimental results show that, compared with parameter estimation without preprocessing, parameter estimation with preprocessing is more accurate: the accuracy of the improved algorithm reaches 97.1%, the signal compression ratio reaches 0.656, and the enhanced signal gains at most 6 dB in signal-to-noise ratio. The convergence speed is fast and asymptotically reaches the lower bound. The method is suitable for parameter estimation of low-order and high-order autoregressive processes and effectively promotes the smooth development of music course signals....
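The sketch below shows the classic multiplicative-update scheme for nonnegative matrix factorization under the generalized KL-divergence, which is the kind of iteration an auxiliary-function construction yields. It omits the smoothness constraint on the gain matrix that the paper adds, and the matrix sizes, rank, seed, and toy data are assumptions chosen for illustration.

```python
import math
import random

EPS = 1e-12  # small constant to avoid division by zero


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def generalized_kl(v, vh):
    """Generalized KL-divergence D(v || vh) summed over all entries."""
    total = 0.0
    for row_v, row_h in zip(v, vh):
        for x, y in zip(row_v, row_h):
            if x > 0:
                total += x * math.log(x / (y + EPS))
            total += y - x
    return total


def nmf_kl(v, rank, iterations=200, seed=0):
    """Factorise v ~ w @ h with multiplicative updates; returns w, h, history."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(iterations):
        vh = matmul(w, h)
        history.append(generalized_kl(v, vh))
        ratio = [[v[i][j] / (vh[i][j] + EPS) for j in range(m)] for i in range(n)]
        # Update the gain (activation) matrix h.
        for k in range(rank):
            col_sum = sum(w[i][k] for i in range(n)) + EPS
            for j in range(m):
                h[k][j] *= sum(w[i][k] * ratio[i][j] for i in range(n)) / col_sum
        vh = matmul(w, h)
        ratio = [[v[i][j] / (vh[i][j] + EPS) for j in range(m)] for i in range(n)]
        # Update the basis matrix w.
        for k in range(rank):
            row_sum = sum(h[k][j] for j in range(m)) + EPS
            for i in range(n):
                w[i][k] *= sum(h[k][j] * ratio[i][j] for j in range(m)) / row_sum
    history.append(generalized_kl(v, matmul(w, h)))
    return w, h, history


# Toy nonnegative "spectrogram" built from two hypothetical sources.
V = matmul([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]],
           [[1.0, 2.0, 0.0, 1.0, 3.0], [0.0, 1.0, 2.0, 1.0, 0.5]])
W, H, HISTORY = nmf_kl(V, rank=2)

if __name__ == "__main__":
    print(f"divergence: {HISTORY[0]:.4f} -> {HISTORY[-1]:.6f}")
```

The updates keep both factors nonnegative by construction, since each step multiplies a nonnegative entry by a nonnegative ratio.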